{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Smart Product Buyer AI Agent\n",
    "\n",
    "## Overview\n",
    "\n",
    "This notebook details the **Smart Product Buyer AI Agent**, developed as a **Proof of Concept (PoC)** to assist users in making informed buying decisions. While the current implementation focuses on car purchasing, it is designed to be **easily extendable** to support additional websites and even other product categories. The project leverages **LangGraph** and **LLM-based intelligence** to provide an interactive, efficient, and adaptable user experience.\n",
    "\n",
    "## Detailed Explanation\n",
    "\n",
    "### Motivation\n",
    "Modern consumers face challenges navigating the vast array of product options online. This agent streamlines the search and decision-making process by:\n",
    "- Understanding user needs and preferences.\n",
    "- Refining and applying complex filters across multiple platforms. For now, it only supports AutoTrader, but it can be extended to other platforms easily by adding a new scraper in the `scrapers` folder.\n",
    "- Providing actionable insights and recommendations.\n",
    "\n",
    "### Key Components\n",
    "1. **User Input Processing**: Understands user requirements and preferences dynamically using LLM-powered interactions.\n",
    "2. **Filter Refinement**: Tailors search filters to match user-defined parameters.\n",
    "3. **Web Scraping and Integration**: Interfaces with platforms like AutoTrader to fetch and present relevant listings.\n",
    "4. **Summarization and Insights**: Provides concise summaries and insights into listings, including general market reliability.\n",
    "\n",
    "### Agent Architecture\n",
    "The agent follows a structured workflow:\n",
    "- **User Need Assessment**: Gathers and summarizes user preferences.\n",
    "- **Filter Building**: Constructs and applies search filters.\n",
    "- **Listing Retrieval**: Collects data from integrated platforms.\n",
    "- **Insight Delivery**: Provides additional information and recommendations.\n",
    "\n",
    "### Benefits\n",
    "- **Efficiency**: Reduces the time spent searching and comparing products.\n",
    "- **Clarity**: Summarizes complex data into actionable insights.\n",
    "- **Flexibility**: Adaptable to various product categories beyond cars.\n",
    "\n",
    "## Visual Representation\n",
    "\n",
    "Below is the diagram of the agent's architecture:\n",
    "\n",
    "![Smart Product Buyer Agent Architecture](../images/car_buyer_agent_langgraph.png)\n",
    "\n",
    "---\n",
    "\n",
    "## Code Setup\n",
    "\n",
    "The following steps guide you through setting up the necessary environment and running the agent.\n",
    "\n",
    "### Prerequisites\n",
    "Ensure you have Python and Jupyter Notebook installed on your system.\n",
    "\n",
    "The project can run on Google Colab or any local Jupyter Notebook environment, although scraping is noticeably slower on Google Colab.\n",
    "We recommend running the project locally, preferably on macOS or Linux. On Windows, running it under WSL gives the best performance.\n",
    "\n",
    "To start the Gradio interface, run all the cells in the notebook, then open the link that Gradio prints.\n",
    "\n",
    "Set the `USE_GRADIO` variable to `False` to run the project without the Gradio interface, which makes it easier to debug and test.\n",
    "\n",
    "Create a `.env` file with the necessary API keys:\n",
    "- `OPENAI_API_KEY` (required)\n",
    "- `LANGCHAIN_API_KEY` (optional; only needed if you use LangSmith)\n",
    "\n",
    "## About the Team\n",
    "\n",
    "The **Smart Product Buyer AI Agent** was created by **Aurore Pistono**, **Clément Florval**, and **Louis Gauthier**, all members of the **[Digiwave](https://dgwave.net)** team. Together, we bring expertise in AI, innovative development strategies, and a passion for creating impactful technological solutions.\n",
    "\n",
    "### Connect with the Team:\n",
    "- [Aurore Pistono on LinkedIn](https://www.linkedin.com/in/aurore-pistono/)\n",
    "- [Clément Florval on LinkedIn](https://www.linkedin.com/in/clement-florval/)\n",
    "- [Louis Gauthier on LinkedIn](https://www.linkedin.com/in/louis-gthier/)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Required Libraries\n",
    "\n",
    "This cell installs all the necessary Python packages required for the project. Below is a description of each package:\n",
    "\n",
    "1. **`langgraph`**:\n",
    "   - Provides tools for building and managing state-based workflows, particularly useful for conversational agents.\n",
    "\n",
    "2. **`langchain` and `langchain-openai`**:\n",
    "   - Frameworks for developing applications powered by Large Language Models (LLMs). `langchain-openai` is specifically tailored for OpenAI's APIs.\n",
    "\n",
    "3. **`langchain-community`**:\n",
    "   - A community-driven package offering additional tools and integrations for LangChain.\n",
    "\n",
    "4. **`importnb`**:\n",
    "   - Enables the import of Jupyter Notebooks as Python modules.\n",
    "\n",
    "5. **`python-dotenv`**:\n",
    "   - Manages environment variables stored in a `.env` file, providing secure access to sensitive data like API keys.\n",
    "\n",
    "6. **`patchright`**:\n",
    "   - A patched, drop-in replacement for Playwright that is harder for websites to detect as automation. The scrapers in this notebook import its `async_playwright` API.\n",
    "\n",
    "7. **`lxml`**:\n",
    "   - A powerful library for parsing and working with XML and HTML documents, often used in web scraping.\n",
    "\n",
    "8. **`nest_asyncio`**:\n",
    "   - Allows nested asynchronous event loops, so `asyncio` code can run inside Jupyter Notebooks, which already run their own event loop.\n",
    "\n",
    "9. **`playwright`**:\n",
    "   - A library for browser automation, used for scraping or testing web applications.\n",
    "\n",
    "10. **`duckduckgo-search`**:\n",
    "    - Provides programmatic access to DuckDuckGo search results for retrieving web-based information.\n",
    "\n",
    "11. **`gradio`**:\n",
    "    - A framework for building user-friendly web-based interfaces, commonly used for showcasing AI applications."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install langgraph\n",
    "%pip install langchain\n",
    "%pip install langchain-openai\n",
    "%pip install langchain-community\n",
    "%pip install importnb\n",
    "%pip install python-dotenv\n",
    "%pip install patchright\n",
    "%pip install lxml\n",
    "%pip install nest_asyncio\n",
    "%pip install playwright\n",
    "%pip install duckduckgo-search\n",
    "%pip install gradio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Necessary Libraries\n",
    "\n",
    "This cell imports the required libraries and modules for building and managing the workflow, integrating OpenAI's APIs, handling asynchronous operations, and working with custom scrapers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary libraries\n",
    "from typing import TypedDict, Dict, List, Any\n",
    "from langgraph.graph import StateGraph, END, START, MessagesState\n",
    "from langchain_openai import ChatOpenAI\n",
    "from IPython.display import display, Image\n",
    "from langchain_core.runnables.graph import MermaidDrawMethod\n",
    "from langchain_community.tools import DuckDuckGoSearchResults\n",
    "from langchain_core.messages import SystemMessage, HumanMessage, AIMessage\n",
    "\n",
    "from langsmith import Client\n",
    "from langsmith import traceable\n",
    "from langsmith.wrappers import wrap_openai\n",
    "import openai\n",
    "import asyncio\n",
    "from importnb import Notebook\n",
    "import time\n",
    "\n",
    "import os\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# For scraping\n",
    "from patchright.async_api import async_playwright\n",
    "from lxml import html\n",
    "from abc import ABC, abstractmethod\n",
    "import re\n",
    "\n",
    "# Required only in Jupyter notebooks, which already run their own event loop\n",
    "import nest_asyncio\n",
    "nest_asyncio.apply()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Web Scraping and Interface Definitions\n",
    "\n",
    "This section defines essential functions and classes for interacting with websites, retrieving listings, and applying filters based on user requirements.\n",
    "\n",
    "1. **`scroll_to_bottom`**:\n",
    "   - Triggers lazy-loaded content by scrolling down the page in three fixed steps (2160 pixels each), pausing briefly after each step so new elements can load before scraping.\n",
    "\n",
    "2. **`block_unnecessary_resources`**:\n",
    "   - Improves scraping efficiency by blocking non-essential resources such as images during browser automation.\n",
    "\n",
    "3. **`WebsiteInterface` Abstract Class**:\n",
    "   - Serves as a base class for defining web scraper interfaces.\n",
    "   - Provides structure for crawling websites and managing filters, ensuring consistency across multiple platforms.\n",
    "\n",
    "4. **`AutotraderInterface`**:\n",
    "   - A concrete implementation of the `WebsiteInterface` tailored for scraping car listings from AutoTrader.\n",
    "   - Includes methods for:\n",
    "     - Retrieving and processing car listings (`crawl`).\n",
    "     - Fetching detailed information for a specific listing (`crawl_listing`).\n",
    "     - Constructing filters and generating query URLs dynamically using LLM responses.\n",
    "\n",
    "The modular design allows for easy addition of new platforms by extending the `WebsiteInterface` class and implementing the required methods.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "async def scroll_to_bottom(page, scroll_delay=0.1):\n",
    "    \"\"\"\n",
    "    Scroll down the page in fixed steps, with delays so that dynamically loaded content has time to appear.\n",
    "    \n",
    "    Args:\n",
    "        page: The Playwright page instance.\n",
    "        scroll_delay: Delay in seconds between scrolls to allow content loading.\n",
    "    \"\"\"\n",
    "    \n",
    "    print(\"Scrolling through the page...\")\n",
    "    \n",
    "    scroll_size = 2160\n",
    "\n",
    "    next_scroll = scroll_size\n",
    "    for i in range(3):\n",
    "        # Scroll one step (scroll_size pixels) further down the page\n",
    "        await page.evaluate(f\"window.scrollTo(0, {next_scroll})\")\n",
    "\n",
    "        next_scroll += scroll_size\n",
    "\n",
    "        # Wait for content to load\n",
    "        await asyncio.sleep(scroll_delay)\n",
    "        \n",
    "    print(\"Finished scrolling through the page.\")\n",
    "\n",
    "async def block_unnecessary_resources(route):\n",
    "    if route.request.resource_type in [\"image\"]:\n",
    "        await route.abort()\n",
    "    else:\n",
    "        await route.continue_()\n",
    "        \n",
    "class WebsiteInterface(ABC):\n",
    "    def __init__(self):\n",
    "        self.base_url = \"\"\n",
    "        \n",
    "    @abstractmethod\n",
    "    async def crawl(self) -> List[Dict[str, str]]:\n",
    "        \"\"\"\n",
    "        Abstract method to crawl the website and extract listings.\n",
    "        Must be implemented by subclasses.\n",
    "        \"\"\"\n",
    "        pass\n",
    "\n",
    "    @abstractmethod\n",
    "    def get_filters_info(self) -> str:\n",
    "        \"\"\"\n",
    "        Abstract method to return a prompt for the LLM describing the filters and expected output format.\n",
    "        Must be implemented by subclasses.\n",
    "        \"\"\"\n",
    "        pass\n",
    "\n",
    "    @abstractmethod\n",
    "    def set_filters_from_llm_response(self, llm_response: str):\n",
    "        \"\"\"\n",
    "        Abstract method to process the LLM's response and set the URL with appropriate filters.\n",
    "        Must be implemented by subclasses.\n",
    "        \"\"\"\n",
    "        pass\n",
    "\n",
    "class AutotraderInterface(WebsiteInterface):\n",
    "    def __init__(self):\n",
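    "        self.base_url = \"https://www.autotrader.com/cars-for-sale/all-cars\"\n",
    "        # Example of a filtered search URL:\n",
    "        # https://www.autotrader.com/cars-for-sale/all-cars/floral-park-ny?endYear=2022&makeCode=BMW&makeCode=FORD&newSearch=true&startYear=2012&zip=11001\n",
    "        # Default to the unfiltered search until set_filters_from_llm_response() sets a filtered URL\n",
    "        self.url = self.base_url\n",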
    "        \n",
    "    async def crawl(self) -> List[Dict[str, str]]:\n",
    "        listings = []\n",
    "        \n",
    "        url = self.url\n",
    "\n",
    "        playwright = await async_playwright().start()\n",
    "\n",
    "        # Launch browser in headless mode\n",
    "        browser = await playwright.chromium.launch(headless=True,\n",
    "                                                    args=[\n",
    "                                                            \"--no-sandbox\",\n",
    "                                                            \"--disable-setuid-sandbox\",\n",
    "                                                            \"--disable-dev-shm-usage\",\n",
    "                                                            \"--disable-extensions\",\n",
    "                                                            \"--disable-gpu\"\n",
    "                                                    ]\n",
    "                                                    )\n",
    "        \n",
    "        context = await browser.new_context(\n",
    "            user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',\n",
    "            viewport={\"width\": 1920, \"height\": 1080},\n",
    "            # no_viewport=True\n",
    "            locale=\"en-US\",\n",
    "            timezone_id=\"America/New_York\",\n",
    "            # java_script_enabled=False,\n",
    "        )\n",
    "\n",
    "\n",
    "        print(\"Opening browser page\")\n",
    "\n",
    "        page = await context.new_page()\n",
    "        \n",
    "        await page.route(\"**/*\", block_unnecessary_resources)\n",
    "\n",
    "        print(\"Loading page\")\n",
    "        \n",
    "        await page.goto(url, wait_until=\"domcontentloaded\")\n",
    "\n",
    "        print(\"Page partially loaded. Starting to scroll.\")\n",
    "        \n",
    "        # Scroll to the bottom of the page\n",
    "        await scroll_to_bottom(page)\n",
    "        \n",
    "        page_content = await page.content()\n",
    "        \n",
    "        # Parse HTML using lxml\n",
    "        tree = html.fromstring(page_content)\n",
    "\n",
    "        # XPath to select each car listing container\n",
    "        listings_elements = tree.xpath('//div[@data-cmp=\"inventoryListing\"]')\n",
    "\n",
    "        listings = []\n",
    "\n",
    "        for listing in listings_elements:\n",
    "            car_data = {}\n",
    "            # Extract car details\n",
    "            car_data['title'] = listing.xpath('.//h2[@data-cmp=\"subheading\"]/text()')\n",
    "            car_data['mileage'] = listing.xpath('.//div[@data-cmp=\"mileageSpecification\"]/text()')\n",
    "            car_data['price'] = listing.xpath('.//div[@data-cmp=\"firstPrice\"]/text()')\n",
    "            car_data['dealer'] = listing.xpath('.//div[@class=\"text-subdued\"]/text()')\n",
    "            car_data['phone'] = listing.xpath('.//span[@data-cmp=\"phoneNumber\"]/text()')\n",
    "            car_data['url'] = listing.xpath('.//a[@data-cmp=\"link\"]/@href')\n",
    "            car_data['image'] = listing.xpath('.//img[@data-cmp=\"inventoryImage\"]/@src')\n",
    "            \n",
    "            # Clean up extracted data\n",
    "            car_data = {key: (val[0].strip() if val else None) for key, val in car_data.items()}\n",
    "            \n",
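    "            # Skip listings without a link: the URL is needed to build the listing ID\n",
    "            if not car_data['url']:\n",
    "                continue\n",
    "            car_data['url'] = car_data['url'].split('?')[0]\n",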
    "            \n",
    "            # Add domain to the URL. Extract domain from the base URL without the path\n",
    "            car_data['url'] = re.sub(r'^(https?://[^/]+).*$', r'\\1', self.base_url) + car_data['url']\n",
    "            \n",
    "            # Set the ID of the listing as the ID of the WebsiteInterface and the car number from URL\n",
    "            car_data = { \"id\": f\"{self.__class__.__name__}_{car_data['url'].split('/')[-1]}\" } | car_data\n",
    "            \n",
    "            listings.append(car_data)\n",
    "            \n",
    "        if __name__ == \"__main__\":\n",
    "            print(\"Found the following car listings:\")\n",
    "            # Display the extracted data\n",
    "            for car in listings:\n",
    "                print(car)\n",
    "\n",
    "        print(\"Found\", len(listings), \"listings\")\n",
    "\n",
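    "        await browser.close()\n",
    "        # Stop the Playwright driver as well, so each crawl does not leave a process behind\n",
    "        await playwright.stop()\n",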
    "        \n",
    "        return listings\n",
    "    \n",
    "    async def crawl_listing(self, listing_url) -> str:\n",
    "        listing_info = \"\"\n",
    "        \n",
    "        url = listing_url\n",
    "\n",
    "        playwright = await async_playwright().start()\n",
    "\n",
    "        # Launch browser in headless mode\n",
    "        browser = await playwright.chromium.launch(headless=True,\n",
    "                                                    args=[\n",
    "                                                            \"--no-sandbox\",\n",
    "                                                            \"--disable-setuid-sandbox\",\n",
    "                                                            \"--disable-dev-shm-usage\",\n",
    "                                                            \"--disable-extensions\",\n",
    "                                                            \"--disable-gpu\"\n",
    "                                                    ]\n",
    "                                                    )\n",
    "        \n",
    "        context = await browser.new_context(\n",
    "            user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',\n",
    "            viewport={\"width\": 1920, \"height\": 1080},\n",
    "            # no_viewport=True\n",
    "            locale=\"en-US\",\n",
    "            timezone_id=\"America/New_York\",\n",
    "            # java_script_enabled=False,\n",
    "        )\n",
    "\n",
    "\n",
    "        print(\"Opening browser page\")\n",
    "\n",
    "        page = await context.new_page()\n",
    "        \n",
    "        await page.route(\"**/*\", block_unnecessary_resources)\n",
    "\n",
    "        print(\"Loading page\")\n",
    "        \n",
    "        await page.goto(url, wait_until=\"domcontentloaded\")\n",
    "\n",
    "        print(\"Page partially loaded. Starting to scroll.\")\n",
    "\n",
    "        # Scroll to the bottom of the page\n",
    "        await scroll_to_bottom(page)\n",
    "\n",
    "        # Get full HTML\n",
    "        page_content = await page.content()\n",
    "\n",
    "        # Parse HTML using lxml to extract all the text\n",
    "        tree = html.fromstring(page_content)\n",
    "        listing_info = tree.xpath(\"//div[contains(@class, 'container') and contains(@class, 'margin-top-5')]/div[contains(@class, 'row')]//text()\")\n",
    "        listing_info = \"\\t\".join(listing_info).strip()\n",
    "\n",
    "        # Seller information should already be included in the listing information\n",
    "        # seller_info = tree.xpath(\"//div[@id='sellerComments']//text()\")\n",
    "        # listing_info = listing_info + seller_info\n",
    "            \n",
    "        if __name__ == \"__main__\":\n",
    "            print(\"Found the following information:\")\n",
    "            # Print the extracted text\n",
    "            print(listing_info)\n",
    "\n",
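    "        await browser.close()\n",
    "        # Stop the Playwright driver as well, so each crawl does not leave a process behind\n",
    "        await playwright.stop()\n",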
    "        \n",
    "        return listing_info\n",
    "    \n",
    "    def get_filters_info(self) -> str:\n",
    "        \"\"\"\n",
    "        Return a prompt for the LLM describing the filters and expected output format.\n",
    "        \"\"\"\n",
    "        return f\"\"\"\n",
    "        You are a helpful assistant that translates user requirements into a URL with query parameters.\n",
    "\n",
    "        The base URL is: {self.base_url}\n",
    "        Filters:\n",
    "        - zip: User's zip code (integer).\n",
    "        - searchRadius: Search radius in miles (integer, e.g., 75, 100, 200).\n",
    "        - startYear: Minimum year of the car (integer).\n",
    "        - endYear: Maximum year of the car (integer).\n",
    "        - makeCode: Car manufacturer code (string, can appear multiple times, e.g., \"BMW\", \"FORD\").\n",
    "        - listingType: Type of listing (one of \"NEW\", \"USED\", \"CERTIFIED\", \"3P_CERT\").\n",
    "        - mileage: Maximum mileage of the car (integer).\n",
    "        - driveGroup: Type of drive (one of \"AWD4WD\", \"FWD\", \"RWD\").\n",
    "        - extColorSimple: External color of the car (e.g., \"BLACK\", \"WHITE\", \"RED\", \"GRAY\").\n",
    "        - intColorSimple: Internal color of the car (e.g., \"BEIGE\", \"BLACK\", \"BLUE\").\n",
    "        - mpgRange: Fuel efficiency in miles per gallon (e.g., \"30-MPG\").\n",
    "        - fuelTypeGroup: Type of fuel (one of \"GSL\", \"DSL\", \"HYB\", \"ELE\", \"PIH\").\n",
    "        - bodyStyleSubtypeCode: Type of body style (e.g., \"FULLSIZE_CREW\", \"COMPACT_EXTEND\").\n",
    "        - truckBedLength: Truck bed length (e.g., \"SHORT\", \"EXTRA SHORT\", \"UNSPECIFIED\").\n",
    "        - vehicleStyleCode: Vehicle style (e.g., \"CONVERT\", \"WAGON\", \"HATCH\", \"SUVCROSS\").\n",
    "        - dealType: Type of deal (e.g., \"goodprice\", \"greatprice\").\n",
    "        - doorCode: Number of doors (e.g., \"2\", \"3\", \"4\").\n",
    "        - engineDisplacement: Engine size range in liters (e.g., \"1.0-1.9\", \"2.0-2.9\").\n",
    "        - featureCode: Specific features of the car (e.g., \"1062\" for heated seats, \"1327\" for navigation).\n",
    "        - transmissionCode: Transmission type (e.g., \"AUT\" for automatic, \"MAN\" for manual).\n",
    "        - vehicleHistoryType: Vehicle history (e.g., \"NO_ACCIDENTS\", \"ONE_OWNER\", \"CLEAN_TITLE\").\n",
    "        - newSearch: Boolean to indicate a new search (e.g., \"true\").\n",
    "        - sortBy: Sorting option for the results (optional). \n",
    "            Options:\n",
    "            - \"relevance\" (default): Sort by relevance.\n",
    "            - \"derivedpriceASC\": Sort by price, lowest to highest.\n",
    "            - \"derivedpriceDESC\": Sort by price, highest to lowest.\n",
    "            - \"distanceASC\": Sort by distance, closest to farthest.\n",
    "            - \"datelistedASC\": Sort by date, oldest first.\n",
    "            - \"datelistedDESC\": Sort by date, newest first.\n",
    "            - \"mileageASC\": Sort by mileage, lowest to highest.\n",
    "            - \"mileageDESC\": Sort by mileage, highest to lowest.\n",
    "            - \"yearASC\": Sort by year, oldest to newest.\n",
    "            - \"yearDESC\": Sort by year, newest to oldest.\n",
    "        \n",
    "        Special filters:\n",
    "        - price: Price is embedded in the path of the URL, e.g., \"/cars-over-45000\" or \"/cars-between-10000-and-20000\".\n",
    "\n",
    "        Example Output:\n",
    "        A complete URL with query parameters, e.g.,:\n",
    "        \"{self.base_url}/cars-between-10000-and-20000?zip=10001&startYear=2010&endYear=2020&makeCode=BMW&makeCode=FORD&listingType=USED&mileage=50000&fuelTypeGroup=GSL&intColorSimple=BLACK&vehicleHistoryType=NO_ACCIDENTS\"\n",
    "\n",
    "        Based on the user's needs, format the response as only the complete URL (no extra explanations). The URL is an example, don't include filters if they are not needed by the user.\n",
    "        \"\"\"\n",
    "        \n",
    "    def set_filters_from_llm_response(self, llm_response: str):\n",
    "        \"\"\"\n",
    "        Process the LLM's response and set the URL with the provided parameters.\n",
    "        \"\"\"\n",
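    "        # Validate and set the URL from the LLM's response.\n",
    "        # Strip whitespace and surrounding quotes first, since the prompt's example URL is quoted.\n",
    "        url = llm_response.strip().strip('\"')\n",
    "        if url.startswith(self.base_url):\n",
    "            self.url = url\n",
    "        else:\n",
    "            raise ValueError(\"Invalid URL format provided by LLM response: \" + llm_response)"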
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set Up Playwright Dependencies\n",
    "\n",
    "This cell installs the necessary dependencies for **Playwright**, a library used for browser automation, and configures the Chromium browser. It performs the following tasks:\n",
    "\n",
    "1. **Install Playwright dependencies**:\n",
    "   - Ensures that system-level dependencies required by Playwright are installed.\n",
    "\n",
    "2. **Install Playwright browsers**:\n",
    "   - Downloads and sets up the necessary browsers for Playwright to work, including Chromium.\n",
    "\n",
    "3. **Install Patchright's Chromium**:\n",
    "   - Downloads the Chromium build used by Patchright, the patched Playwright variant that the scrapers import.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!playwright install-deps\n",
    "!playwright install\n",
    "!patchright install chromium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Environment Variables and Set Up API Keys\n",
    "\n",
    "This cell configures the environment and initializes API keys required for the project:\n",
    "\n",
    "1. **Load `.env` Variables**:\n",
    "   - Uses `load_dotenv()` to load sensitive information (like API keys) from a `.env` file.\n",
    "\n",
    "2. **Configure OpenAI API Key**:\n",
    "   - Retrieves the `OPENAI_API_KEY` from the environment or Colab's `userdata` if running in Google Colab.\n",
    "\n",
    "3. **Set LangChain Configuration**:\n",
    "   - Disables tracing (`LANGCHAIN_TRACING_V2`) and configures the LangChain endpoint and project.\n",
    "\n",
    "4. **Initialize Clients**:\n",
    "   - Sets up `GPT` as the model (`gpt-4o-mini`) using `ChatOpenAI`.\n",
    "   - Creates `langsmith_client` for LangSmith tracing and `openai_client`, an OpenAI client wrapped with `wrap_openai` so its calls can be traced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables\n",
    "load_dotenv()\n",
    "\n",
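    "try:\n",
    "  from google.colab import userdata\n",
    "  # Prefer a key that is already in the environment; otherwise fall back to Colab secrets\n",
    "  os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY') or userdata.get('OPENAI_API_KEY')\n",
    "except Exception:\n",
    "  # Not running in Colab (or the secret is unavailable): the key must come from the environment or .env\n",
    "  if not os.getenv('OPENAI_API_KEY'):\n",
    "    raise RuntimeError(\"OPENAI_API_KEY is not set. Add it to your .env file or environment.\")\n",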
    "  \n",
    "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"false\"\n",
    "os.environ[\"LANGCHAIN_ENDPOINT\"] = \"https://api.smith.langchain.com\"\n",
    "os.environ[\"LANGCHAIN_PROJECT\"] = \"car_buyer_agent\"\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = os.getenv('LANGCHAIN_API_KEY', \"\")\n",
    "\n",
    "GPT = ChatOpenAI(model=\"gpt-4o-mini\")\n",
    "\n",
    "langsmith_client = Client()\n",
    "openai_client = wrap_openai(openai.Client())\n",
    "\n",
    "# search = DuckDuckGoSearchResults()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `State` Class\n",
    "\n",
    "This cell defines the `State` class, which inherits from `MessagesState`, to represent the current state of the car-buying process. It includes:\n",
    "\n",
    "1. **`user_needs`**: Stores the user's requirements for the car (e.g., budget, features).\n",
    "2. **`web_interfaces`**: A list of web scraper interfaces (e.g., AutoTrader) to fetch car listings.\n",
    "3. **`listings`**: A collection of car listings retrieved from the web platforms.\n",
    "4. **`selected_listing`**: The specific car listing chosen by the user for further exploration.\n",
    "5. **`additional_info`**: Additional information about the selected car (e.g., common issues, reliability).\n",
    "6. **`next_node`**: The next action or state transition in the workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class State(MessagesState):\n",
    "    \"\"\"Represents the state of the car-buying process.\"\"\"\n",
    "    user_needs: str\n",
    "    web_interfaces: List[WebsiteInterface]\n",
    "    listings: List[Dict[str, str]]\n",
    "    selected_listing: Dict[str, str]\n",
    "    additional_info: Dict[str, str]\n",
    "    next_node: str\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Input and Output Helper Functions\n",
    "\n",
    "1. **`get_user_input`**:\n",
    "   - A utility function to capture user input during the interaction.\n",
    "   - It wraps Python’s `input()` function, allowing for optional arguments (`*args`, `**kwargs`) for flexibility in prompt customization.\n",
    "\n",
    "2. **`show_assistant_output`**:\n",
    "   - Displays the assistant's output (e.g., LLM responses) to the user.\n",
    "   - Uses Python's `print()` function, enabling formatted or contextual responses.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_user_input(*args, **kwargs):\n",
    "    \"\"\"Get user input.\"\"\"\n",
    "    return input(*args, **kwargs)\n",
    "\n",
    "def show_assistant_output(*args, **kwargs):\n",
    "    \"\"\"Show the output of the LLM.\"\"\"\n",
    "    print(*args, **kwargs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `ask_user_needs` Function\n",
    "\n",
    "This function initiates the process of gathering user requirements for the car-buying assistant. It uses LLM responses to guide the conversation and determine the next steps. Key components include:\n",
    "\n",
    "1. **State Initialization**:\n",
    "   - Retrieves previous messages and existing user needs from the `state` object.\n",
    "\n",
    "2. **Conversation Starter**:\n",
    "   - Constructs a system message to ask the user about their car requirements (e.g., budget, usage, preferences).\n",
    "\n",
    "3. **Interaction Handling**:\n",
    "   - Appends the assistant's and user's messages to the conversation flow using `SystemMessage`, `AIMessage`, and `HumanMessage`.\n",
    "\n",
    "4. **Summarization**:\n",
    "   - Summarizes user input into concise points and determines the next step in the workflow:\n",
    "     - `ask_user_needs`: Collect more details.\n",
    "     - `build_filters`: Proceed to filtering options.\n",
    "     - `irrelevant`: Handle unrelated queries.\n",
    "\n",
    "5. **LLM Integration**:\n",
    "   - Uses `USER_NEEDS_GPT` (configured with a custom response format) to process user needs and suggest the next action.\n",
    "\n",
    "6. **Output**:\n",
    "   - Displays summarized needs and the determined next step to the user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import TypedDict\n",
    "from enum import Enum\n",
    "from pydantic import BaseModel\n",
    "import json\n",
    "\n",
    "class NextStep(Enum):\n",
    "    ASK_USER_NEEDS = \"ask_user_needs\"\n",
    "    BUILD_FILTERS = \"build_filters\"\n",
    "    IRRELEVANT = \"irrelevant\"\n",
    "\n",
    "class UserNeeds(BaseModel):\n",
    "    user_needs: str\n",
    "    next_step: NextStep\n",
    "\n",
    "USER_NEEDS_GPT = ChatOpenAI(model=\"gpt-4o-mini\", response_format=UserNeeds)\n",
    "\n",
    "def ask_user_needs(state: State) -> State:\n",
    "    \"\"\"Ask user initial questions to define their needs for the car.\"\"\"\n",
    "    messages = state.get(\"messages\", [])    \n",
    "    if len(messages) == 0:\n",
    "        system_message = \"You are a car buying assistant. Your goal is to help the user find a car that meets their needs. Start by introducing yourself and asking about their requirements, such as intended usage (e.g., commuting, family trips), budget, size preferences, and any specific constraints or features they value. Use their responses to guide them toward the best options.\"\n",
    "    else:\n",
    "        system_message = \"Ask the user for any additional information that can help narrow down the search. If he asked any questions before, answer them before asking for more information. When answering, make sure to provide clear and concise information, with relevant examples.\"\n",
    "        \n",
    "    existing_needs = state.get(\"user_needs\", \"\")\n",
    "    if existing_needs:\n",
    "        system_message += f\" Here's what we know about the needs of the user so far:\\n\\n{existing_needs}\"\n",
    "\n",
    "    messages.append(SystemMessage(content=system_message))\n",
    "\n",
    "    # Get message from the LLM\n",
    "    response = GPT.invoke(messages).content\n",
    "    messages += [AIMessage(response)]\n",
    "    show_assistant_output(f\"\\033[92m{messages[-1].content}\\033[0m\", flush=True)\n",
    "    \n",
    "    messages += [HumanMessage(get_user_input(response))]\n",
    "    print(f\"\\033[94m{messages[-1].content}\\033[0m\", flush=True)\n",
    "    \n",
    "    summarization_messages = messages.copy()\n",
    "    \n",
    "    summarization_messages += [\n",
    "        SystemMessage(\n",
    "            \"Summarize the user's car-buying needs in clear and concise bullet points based on their input and any prior knowledge.\\n\"\n",
    "            \"Provide the next step, such as asking for more details or answer questions under ask_user_needs or going forward to build_filter:\\n\"\n",
    "            \"- Use 'ask_user_needs' if you need more information or if the user asked a question.\\n\"\n",
    "            \"- Use 'build_filters' if you have enough details to search for cars online.\\n\"\n",
    "            \"If the user's query is irrelevant to the matter at hand (buying a car), respond 'irrelevant'.\"\n",
    "        )\n",
    "    ]\n",
    "    \n",
    "    response = json.loads(USER_NEEDS_GPT.invoke(summarization_messages).content)\n",
    "\n",
    "    state[\"user_needs\"] = response[\"user_needs\"]\n",
    "    \n",
    "    messages += [AIMessage(\"I have summarized your car-buying needs as follows:\\n\" + state[\"user_needs\"])]\n",
    "    \n",
    "    show_assistant_output(f\"\\033[92m{messages[-1].content}\\033[0m\")\n",
    "    \n",
    "    state[\"next_node\"] = response[\"next_step\"]\n",
    "        \n",
    "    print(f\"\\nNext node: {state['next_node']}\", flush=True)\n",
    "\n",
    "    return state"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `build_filters` Function\n",
    "\n",
    "This function constructs and refines search filters based on user-provided requirements. It interacts with web scraper interfaces to tailor the search for relevant car listings. Key elements include:\n",
    "\n",
    "1. **Initialization**:\n",
    "   - Displays a message indicating the filter-building process has started.\n",
    "\n",
    "2. **Iterating Over Web Interfaces**:\n",
    "   - Loops through each scraper in `state[\"web_interfaces\"]` to gather and apply filter options.\n",
    "\n",
    "3. **Filter Information**:\n",
    "   - Retrieves filter details from each interface using `get_filters_info()` and incorporates them with the user's needs.\n",
    "\n",
    "4. **LLM-Assisted Filter Application**:\n",
    "   - Sends the filter details and user needs to the LLM (`GPT`) for processing.\n",
    "   - Parses and applies the LLM's response to the interface using `set_filters_from_llm_response()`.\n",
    "\n",
    "5. **Error Handling**:\n",
    "   - Catches and displays errors (e.g., validation issues or unexpected exceptions) for each interface, ensuring robustness.\n",
    "\n",
    "6. **Output**:\n",
    "   - Provides success or failure messages for each interface and displays the updated search URL when filters are successfully applied.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_filters(state: State) -> State:\n",
    "    \"\"\"Build and refine search filters based on user needs.\"\"\"\n",
    "\n",
    "    show_assistant_output(\"Building filters based on user needs...\")\n",
    "    \n",
    "    for interface in state[\"web_interfaces\"]:\n",
    "        filters_info = interface.get_filters_info()\n",
    "        \n",
    "        # TODO: Check if this website is useful to the user based on the filters\n",
    "        # If not continue to the next interface\n",
    "        \n",
    "        # If the website is useful, use LLM to setup the filters based on user needs\n",
    "        \n",
    "        # Define system instructions with filters information\n",
    "        system_message = SystemMessage(filters_info + \"\\n\\n\" + \"User needs:\\n\" + state[\"user_needs\"])\n",
    "\n",
    "        # Use the LLM to process the user's needs and set the filters\n",
    "        try:\n",
    "            result = GPT.invoke([system_message])\n",
    "            llm_response = result.content.strip()\n",
    "\n",
    "            # Validate and set the filters for the interface\n",
    "            interface.set_filters_from_llm_response(llm_response)\n",
    "            show_assistant_output(f\"\\nSuccessfully set filters for: {interface.__class__.__name__}\")\n",
    "            show_assistant_output(f\"Updated URL: {interface.url}\")\n",
    "        except ValueError as e:\n",
    "            show_assistant_output(f\"Failed to set filters for {interface.base_url}: {e}\")\n",
    "        except Exception as e:\n",
    "            show_assistant_output(f\"An error occurred while processing filters for {interface.base_url}: {e}\")\n",
    "    \n",
    "    return"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `fetch_listings_from_sources` Function\n",
    "\n",
    "This asynchronous function retrieves car listings from various web interfaces based on the applied filters. Key aspects include:\n",
    "\n",
    "1. **Purpose**:\n",
    "   - Simulates the process of fetching car listings, tailored to the filters defined earlier.\n",
    "\n",
    "2. **Input**:\n",
    "   - `web_interfaces`: A list of web scraper interfaces (e.g., AutoTrader) that implement the `crawl` method for data retrieval.\n",
    "\n",
    "3. **Operation**:\n",
    "   - Iterates over each interface in `web_interfaces` and asynchronously collects listings using `await interface.crawl()`.\n",
    "\n",
    "4. **Output**:\n",
    "   - Returns a consolidated list of dictionaries, where each dictionary represents a car listing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "async def fetch_listings_from_sources(web_interfaces: List[WebsiteInterface]) -> List[Dict[str, str]]:\n",
    "    \"\"\"Simulate retrieval of car listings from Autotrader.com based on filters.\n",
    "    \n",
    "    Args:\n",
    "        filters (dict): Dictionary containing search filters (e.g., budget, fuel type).\n",
    "        \n",
    "    Returns:\n",
    "        list: A list of dictionaries, each representing a car listing.\n",
    "    \"\"\"\n",
    "    listings = []\n",
    "    for interface in web_interfaces:\n",
    "        listings += await interface.crawl()\n",
    "        \n",
    "    return listings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `search_listings` Function\n",
    "\n",
    "This function searches for car listings based on the user's needs and displays the most relevant results. It incorporates user feedback to determine the next step in the workflow.\n",
    "\n",
    "1. **Initial Setup**:\n",
    "   - Adds a system message to indicate the start of the search.\n",
    "   - Calls `fetch_listings_from_sources` asynchronously to retrieve listings from the web interfaces.\n",
    "\n",
    "2. **Listing Retrieval**:\n",
    "   - Uses `asyncio.run()` to fetch listings, stores them in `state[\"listings\"]`, and outputs the total count retrieved.\n",
    "\n",
    "3. **Display Listings**:\n",
    "   - Constructs a user-friendly list of the top 5 results, including images, titles, and other details.\n",
    "   - Prompts the user to select a listing, refine the search, or end the conversation.\n",
    "\n",
    "4. **User Interaction**:\n",
    "   - Captures the user's response via the `CLASSIFIER_GPT` model, which categorizes the action (`select_listing`, `refine_search`, or `end_conversation`).\n",
    "   - Updates the `state[\"next_node\"]` based on the user's choice:\n",
    "     - `select_listing`: Prepares for detailed exploration of a specific listing.\n",
    "     - `refine_search`: Returns to the user needs stage.\n",
    "     - `end_conversation`: Terminates the workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Literal, Optional\n",
    "\n",
    "class UserResponse(BaseModel):\n",
    "    action: Literal['select_listing', 'refine_search', 'end_conversation']\n",
    "    listing_id: Optional[str]\n",
    "\n",
    "CLASSIFIER_GPT = ChatOpenAI(model=\"gpt-4o-mini\", response_format=UserResponse)\n",
    "\n",
    "def search_listings(state: State) -> State:\n",
    "    \"\"\"Search for cars on LaCentrale and mobile.de based on filters.\"\"\"\n",
    "    \"\"\"Display the first listings for the user to view.\"\"\"\n",
    "    \"\"\"Synchronous wrapper for search_listings.\"\"\"\n",
    "\n",
    "    state[\"messages\"] += [SystemMessage(\"Searching for listings based on user needs, this may take time...\")]\n",
    "    show_assistant_output(state[\"messages\"][-1].content)\n",
    "\n",
    "    async def _search_listings():\n",
    "        return await fetch_listings_from_sources(state[\"web_interfaces\"])\n",
    "    \n",
    "    listings = asyncio.run(_search_listings())\n",
    "    state[\"listings\"] = listings\n",
    "    \n",
    "    show_assistant_output(f\"Successfully fetched {len(listings)} listings from the sources.\")\n",
    "    \n",
    "    AI_message = \"\"\n",
    "    \n",
    "    # Display the first few listings for the user to view\n",
    "    AI_message += \"Here are recent listings that match your requirements:\\n\"\n",
    "    for i, listing in enumerate(state[\"listings\"][:5], 1):\n",
    "        AI_message += f\"{i}.\\n\"\n",
    "        for key, value in listing.items():\n",
    "            formatted_key = key.replace(\"_\", \" \").capitalize()\n",
    "            if formatted_key == \"Image\" and value:\n",
    "                AI_message += f\"   {formatted_key}: ![Example Image]({value})\\n\"\n",
    "            else:\n",
    "                AI_message += f\"   {formatted_key}: {value}\\n\"\n",
    "        AI_message += \"\\n\"  # Add an extra line for readability\n",
    "    \n",
    "    user_prompt = \"Would you like to view more details about a specific listing, or refine your search (Write END to finish this conversation) ?\"\n",
    "    AI_message += user_prompt\n",
    "        \n",
    "    state[\"messages\"].append(AIMessage(AI_message))\n",
    "    show_assistant_output(f\"\\033[92m{state['messages'][-1].content}\\033[0m\")\n",
    "    state[\"messages\"].append(HumanMessage(get_user_input(user_prompt)))\n",
    "    print(f\"\\033[94m{state['messages'][-1].content}\\033[0m\")\n",
    "       \n",
    "    response = json.loads(CLASSIFIER_GPT.invoke(state[\"messages\"]).content)\n",
    "\n",
    "    if response[\"action\"] == \"select_listing\":\n",
    "        state[\"next_node\"] = \"fetch_additional_info\"\n",
    "        selected_listing_id = response[\"listing_id\"]\n",
    "        for i, listing in enumerate(state[\"listings\"][:5], 1):\n",
    "            if selected_listing_id in listing[\"id\"]:\n",
    "                state[\"selected_listing\"] = listing\n",
    "                break\n",
    "    elif response[\"action\"] == \"refine_search\":\n",
    "        state[\"next_node\"] = \"ask_user_needs\"\n",
    "    else:\n",
    "        state[\"next_node\"] = END\n",
    "        \n",
    "    return state"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the `fetch_additional_info` Function\n",
    "\n",
    "This function retrieves detailed information about a selected car listing, enhances it with insights from the web, and allows the user to decide the next steps.\n",
    "\n",
    "1. **Crawl the Car Listing**:\n",
    "   - Asynchronously fetches additional details about the selected car from its source URL using the appropriate scraper (`crawl_listing`).\n",
    "\n",
    "2. **Summarize Car Details**:\n",
    "   - Uses the LLM (`GPT`) to generate a clear and concise summary of the car's details, formatted for readability.\n",
    "\n",
    "3. **Fetch Web-Based Insights**:\n",
    "   - Queries DuckDuckGo for information about the car model (e.g., common issues, reliability) and formats the results.\n",
    "\n",
    "4. **Enhance Context with LLM**:\n",
    "   - Combines the fetched insights with user needs to generate a comprehensive summary of the car's specifications and general issues.\n",
    "\n",
    "5. **User Interaction**:\n",
    "   - Displays the additional information to the user and prompts them to either:\n",
    "     - View details of another listing (`fetch_additional_info`).\n",
    "     - Refine their search (`ask_user_needs`).\n",
    "     - End the conversation.\n",
    "\n",
    "6. **Workflow Updates**:\n",
    "   - Updates `state[\"next_node\"]` based on the user's action, ensuring a smooth transition to the next step.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.tools import DuckDuckGoSearchResults\n",
    "\n",
    "duckduckgo_search = DuckDuckGoSearchResults(max_results=3)\n",
    "\n",
    "def fetch_additional_info(state: State) -> State:\n",
    "    \"\"\"Fetch more details about the selected car listing.\"\"\"\n",
    "    listing = state[\"selected_listing\"]\n",
    "\n",
    "    # Crawl the car listing page to get more details about the car for sale and the seller\n",
    "\n",
    "    async def _crawl_car_listing():\n",
    "        for interface in state[\"web_interfaces\"]:\n",
    "            if listing[\"id\"].split(\"_\")[0].lower() in interface.__class__.__name__.lower():\n",
    "                return await interface.crawl_listing(listing[\"url\"])\n",
    "    \n",
    "    info_car_for_sale = asyncio.run(_crawl_car_listing())\n",
    "\n",
    "    # Call the LLM to summarize the information about the car for sale into a concise paragraph\n",
    "    prompt = SystemMessage(\n",
    "        f\"Summarize all the relevant information about the selected car for sale into a paragraph: {listing['title']}\\n\\n\"\n",
    "        f\"Here is the raw information about the car for sale:\\n\\n{info_car_for_sale}\"\n",
    "        f\"Format the summary clearly and concisely, with line breaks between sections.\"\n",
    "    )\n",
    "\n",
    "    car_info_summary = GPT.invoke([prompt]).content\n",
    "\n",
    "    show_assistant_output(\"\\033[92mHere are more details about the car for sale:\\n\\033[0m\", flush=True)\n",
    "\n",
    "    show_assistant_output(\"\\033[92m\" + car_info_summary + \"\\n\\n\\033[0m\", flush=True)\n",
    "\n",
    "    state[\"messages\"] += [prompt, AIMessage(car_info_summary)]\n",
    "\n",
    "    # Search for common issues and reliability of the car on DuckDuckGo\n",
    "    car_name = listing[\"title\"]\n",
    "\n",
    "    queries = [f\"{car_name} common issues\", f\"{car_name} problem\", f\"{car_name} reliability\"]\n",
    "    context = \"\"\n",
    "    for query in queries:\n",
    "        search_results = duckduckgo_search.invoke(query)\n",
    "        formatted_results = f\"QUERY: {query}\\n\\n{search_results}\\n-------------------\\n\"\n",
    "        context += formatted_results\n",
    "\n",
    "    prompt = SystemMessage(\n",
    "        f\"Provide additional information about this car: {listing['title']}, \"\n",
    "        f\"including engine specifications, common issues with this model, and market value.\"\n",
    "        f\"Here is additioanl context to help you provide the information:\\n\\n{context}\"\n",
    "        f\"Here are the user needs, give some insights about the car based on the user needs:\\n\\n{state['user_needs']}\"\n",
    "    )\n",
    "    \n",
    "    result = GPT.invoke([prompt])\n",
    "    \n",
    "    listing[\"additional_info\"] = result.content\n",
    "    \n",
    "    show_assistant_output(f\"\\033[92mHere is additional information about the model in general, coming from Internet:\\n{listing['additional_info']}\\n\\033[0m\")\n",
    "    \n",
    "    user_prompt = \"Would you like to view more details about another listing, or refine your search (Write END to finish this conversation) ?\"\n",
    "    state[\"messages\"] += [SystemMessage(user_prompt)]\n",
    "    state[\"messages\"] += [HumanMessage(get_user_input(user_prompt))]\n",
    "    print(f\"\\033[94m{state['messages'][-1].content}\\033[0m\", flush=True)\n",
    "    \n",
    "    response = json.loads(CLASSIFIER_GPT.invoke(state[\"messages\"]).content)\n",
    "\n",
    "    if response[\"action\"] == \"select_listing\":\n",
    "        state[\"next_node\"] = \"fetch_additional_info\"\n",
    "        selected_listing_id = response[\"listing_id\"]\n",
    "        for i, listing in enumerate(state[\"listings\"][:5], 1):\n",
    "            if selected_listing_id in listing[\"id\"]:\n",
    "                state[\"selected_listing\"] = listing\n",
    "                break\n",
    "    elif response[\"action\"] == \"refine_search\":\n",
    "        state[\"next_node\"] = \"ask_user_needs\"\n",
    "    else:\n",
    "        state[\"next_node\"] = END\n",
    "    \n",
    "    return state"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize and Define the Workflow Graph\n",
    "\n",
    "This cell sets up the state-based workflow using `StateGraph` from `langgraph`. It defines the nodes, edges, and conditional logic for navigating through the car-buying assistant's process.\n",
    "\n",
    "1. **Initialize the Workflow**:\n",
    "   - The `StateGraph` object is created with the `State` class to manage the workflow state.\n",
    "\n",
    "2. **Define Nodes**:\n",
    "   - Each step in the workflow is represented as a node:\n",
    "     - `ask_user_needs`: Gathers user requirements.\n",
    "     - `build_filters`: Constructs search filters.\n",
    "     - `search_listings`: Retrieves car listings.\n",
    "     - `fetch_additional_info`: Provides detailed information about a selected car.\n",
    "     - `irrelevant`: Handles unrelated queries.\n",
    "\n",
    "3. **Set Workflow Edges**:\n",
    "   - Nodes are connected based on the possible transitions:\n",
    "     - Conditional transitions from `ask_user_needs` depend on the next step (`build_filters`, `ask_user_needs`, or `irrelevant`).\n",
    "     - `build_filters` transitions directly to `search_listings`.\n",
    "     - Conditional transitions from `search_listings` determine whether to fetch more details, return to user needs, or end the workflow.\n",
    "     - The `irrelevant` node ends the workflow.\n",
    "\n",
    "4. **Entry and Exit Points**:\n",
    "   - The workflow begins at the `ask_user_needs` node.\n",
    "   - Conditional edges from `fetch_additional_info` allow for revisiting user needs, exploring more details, or ending the workflow.\n",
    "\n",
    "5. **Compile the Workflow**:\n",
    "   - The workflow is compiled into an executable application (`app`), ready to process user queries.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the StateGraph\n",
    "workflow = StateGraph(State)\n",
    "\n",
    "# Define the nodes in the graph\n",
    "workflow.add_node(\"ask_user_needs\", ask_user_needs)\n",
    "workflow.add_node(\"build_filters\", build_filters)\n",
    "workflow.add_node(\"search_listings\", search_listings)\n",
    "workflow.add_node(\"fetch_additional_info\", fetch_additional_info)\n",
    "workflow.add_node(\"irrelevant\", lambda state: state)\n",
    "\n",
    "# Define edges\n",
    "workflow.add_conditional_edges(\"ask_user_needs\", lambda state: state[\"next_node\"], [\"build_filters\", \"ask_user_needs\", \"irrelevant\"])\n",
    "workflow.add_edge(\"build_filters\", \"search_listings\")\n",
    "workflow.add_conditional_edges(\"search_listings\", lambda state: state[\"next_node\"], [\"fetch_additional_info\", \"ask_user_needs\", END])\n",
    "workflow.add_edge(\"irrelevant\", END)\n",
    "\n",
    "# Set the entry and exit points\n",
    "workflow.set_entry_point(\"ask_user_needs\")\n",
    "workflow.add_conditional_edges(\"fetch_additional_info\", lambda state: state[\"next_node\"], [\"ask_user_needs\", \"fetch_additional_info\", END])\n",
    "\n",
    "\n",
    "# Compile the workflow\n",
    "app = workflow.compile()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize the Workflow Graph\n",
    "\n",
    "This cell generates and displays a visual representation of the workflow using Mermaid.js. The graph shows the nodes and their connections, providing a clear overview of the assistant's structure.\n",
    "\n",
    "1. **Generate Graph**:\n",
    "   - The `draw_mermaid_png` method from `MermaidDrawMethod.API` creates a PNG image of the workflow graph.\n",
    "\n",
    "2. **Display Graph**:\n",
    "   - The `Image` function renders the generated PNG, enabling visualization of the nodes (states) and edges (transitions).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(\n",
    "    Image(\n",
    "        app.get_graph().draw_mermaid_png(\n",
    "            draw_method=MermaidDrawMethod.API,\n",
    "        )\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define `run_car_buyer_agent` Function\n",
    "\n",
    "This function initializes and executes the car-buying assistant workflow using the compiled LangGraph application.\n",
    "\n",
    "1. **Initialization**:\n",
    "   - Creates an empty `messages` list to store the conversation history.\n",
    "\n",
    "2. **Set Initial State**:\n",
    "   - Defines the `initial_state` object with the following:\n",
    "     - `user_needs`: Empty, awaiting user input.\n",
    "     - `web_interfaces`: A list containing the `AutotraderInterface` for scraping car listings.\n",
    "     - `listings`, `selected_listing`, `additional_info`: Empty placeholders to be populated during the workflow.\n",
    "     - `next_node`: Empty, to be updated dynamically.\n",
    "     - `messages`: Tracks messages exchanged between the assistant and the user.\n",
    "\n",
    "3. **Run Workflow**:\n",
    "   - Invokes the workflow using `app.invoke()` with the initialized state.\n",
    "\n",
    "4. **Output**:\n",
    "   - Returns the final `result` after executing the workflow.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify initial setup and function invocation\n",
    "def run_car_buyer_agent():\n",
    "    \"\"\"Run the car-buying assistant with LangGraph.\"\"\"\n",
    "        \n",
    "    messages = []\n",
    "    \n",
    "    initial_state = State(\n",
    "        user_needs={}, \n",
    "        web_interfaces=[AutotraderInterface()], \n",
    "        listings=[],\n",
    "        selected_listing={}, \n",
    "        additional_info={},\n",
    "        next_node=\"\",\n",
    "        messages=messages\n",
    "    )\n",
    "    result = app.invoke(initial_state)\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conditional Execution Without Gradio\n",
    "\n",
    "This cell provides a fallback mechanism to run the car-buying assistant in a command-line environment if the `USE_GRADIO` flag is set to `False`.\n",
    "\n",
    "1. **Define Input/Output Functions**:\n",
    "   - `get_user_input`: Captures user input via the terminal using `input()`.\n",
    "   - `show_assistant_output`: Displays the assistant's output using `print()`.\n",
    "\n",
    "2. **Execute the Agent**:\n",
    "   - Calls the `run_car_buyer_agent()` function to initiate the workflow and stores the result in `car_buyer_result`.\n",
    "\n",
    "3. **Debugging Output**:\n",
    "   - Prints the raw result of the workflow execution for inspection.\n",
    "\n",
    "4. **Display Final Recommendation**:\n",
    "   - If a car listing is selected:\n",
    "     - Outputs the title, price, and mileage of the recommended car.\n",
    "     - Prints additional details retrieved during the workflow.\n",
    "   - If no listing is selected, informs the user.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "USE_GRADIO = True\n",
    "\n",
    "if not USE_GRADIO:\n",
    "    def get_user_input(*args, **kwargs):\n",
    "        \"\"\"Get user input.\"\"\"\n",
    "        return input(*args, **kwargs)\n",
    "\n",
    "    def show_assistant_output(*args, **kwargs):\n",
    "        \"\"\"Show the output of the LLM.\"\"\"\n",
    "        print(*args, **kwargs)\n",
    "\n",
    "    # Execute the agent\n",
    "    car_buyer_result = run_car_buyer_agent()\n",
    "\n",
    "    # Print result for debugging purposes\n",
    "    print(\"Car Buyer Result:\", car_buyer_result)\n",
    "\n",
    "    # Display summary of the final recommendation\n",
    "    if \"selected_listing\" in car_buyer_result:\n",
    "        listing = car_buyer_result[\"selected_listing\"]\n",
    "        print(f\"\\nFinal Recommendation:\\n{listing['title']} - {listing['price']} - {listing['mileage']} km\")\n",
    "        print(\"Additional Information:\")\n",
    "        for key, value in car_buyer_result[\"additional_info\"].items():\n",
    "            print(f\"{key}: {value}\")\n",
    "    else:\n",
    "        print(\"No car listing selected.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gradio Interface and Threaded Execution\n",
    "\n",
    "This cell sets up a **Gradio-based chatbot interface** for the car-buying assistant, enabling user interaction via a web-based GUI.\n",
    "\n",
    "#### Key Components:\n",
    "\n",
    "1. **Input and Output Queues**:\n",
    "   - `InputQueue`:\n",
    "     - Simulates `stdin` behavior by queuing user inputs for asynchronous processing.\n",
    "   - `output_queue`:\n",
    "     - Stores the assistant's responses for incremental display in the UI.\n",
    "\n",
    "2. **Input/Output Functions**:\n",
    "   - `get_user_input`:\n",
    "     - Waits for and retrieves user input from the `input_queue`.\n",
    "   - `show_assistant_output`:\n",
    "     - Formats and sends assistant responses to the `output_queue`.\n",
    "\n",
    "3. **Agent Interaction**:\n",
    "   - `interact_with_agent`:\n",
    "     - Processes user messages, sends them to the agent, and retrieves responses incrementally for a seamless conversational flow.\n",
    "   - `get_initial_message`:\n",
    "     - Captures the initial response from the agent to display upon starting the interface.\n",
    "\n",
    "4. **Threaded Execution**:\n",
    "   - `run_langgraph_agent`:\n",
    "     - Executes the LangGraph-based workflow in a separate thread to allow non-blocking interaction in the Gradio interface.\n",
    "\n",
    "5. **Gradio Interface**:\n",
    "   - `gr.ChatInterface`:\n",
    "     - Creates a real-time chat interface with the following configurations:\n",
    "       - `interact_with_agent`: Handles message exchanges.\n",
    "       - `chatbot`: Configures the chat window's appearance and behavior.\n",
    "\n",
    "6. **Execution Flow**:\n",
    "   - If `USE_GRADIO` is `True`:\n",
    "     - Starts the agent in a separate thread.\n",
    "     - Initializes the Gradio interface with the assistant's initial message and launches it.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import threading\n",
    "import queue\n",
    "import gradio as gr\n",
    "from gradio import ChatMessage\n",
    "import time\n",
    "import re\n",
    "\n",
    "waiting_for_input = False # Flag to indicate if the agent is waiting for user input\n",
    "\n",
    "class InputQueue:\n",
    "    \"\"\"A custom input queue that mimics stdin behavior.\"\"\"\n",
    "    def __init__(self):\n",
    "        self.queue = queue.Queue()\n",
    "\n",
    "    def readline(self):\n",
    "        \"\"\"Mimic the readline behavior of stdin.\"\"\"\n",
    "        try:\n",
    "            r = self.queue.get(block=True)  # Wait until input is available\n",
    "            return r\n",
    "        except queue.Empty:\n",
    "            return \"\"\n",
    "\n",
    "    def write(self, message):\n",
    "        \"\"\"Handle writes if needed (for debugging).\"\"\"\n",
    "        pass\n",
    "\n",
    "    def flush(self):\n",
    "        \"\"\"No-op for compatibility.\"\"\"\n",
    "        pass\n",
    "\n",
    "    def put(self, message):\n",
    "        \"\"\"Put a message into the queue.\"\"\"\n",
    "        self.queue.put(message)\n",
    "\n",
    "# A thread-safe queue for communication\n",
    "output_queue = queue.Queue()\n",
    "# Replace sys.stdin with the custom InputQueue\n",
    "input_queue = queue.Queue()\n",
    "\n",
    "def get_user_input(*args, **kwargs):\n",
    "    \"\"\"Get user input.\"\"\"\n",
    "    global waiting_for_input\n",
    "    print(\"Waiting for user input...\")\n",
    "    waiting_for_input = True\n",
    "    r = input_queue.get()\n",
    "    waiting_for_input = False\n",
    "    print(f\"Received user input\")\n",
    "    return r\n",
    "\n",
    "def show_assistant_output(*args, **kwargs):\n",
    "    \"\"\"Show the output of the LLM.\"\"\"\n",
    "    \n",
    "    result = \" \".join(args) + kwargs.get(\"end\", \"\\n\")\n",
    "    \n",
    "    # Replace any Color Codes with Regex\n",
    "    result = re.sub(r'\\033\\[\\d+m', '', result)\n",
    "    # result = result.replace(\"\\033[92m\", \"\").replace(\"\\033[0m\", \"\").replace(\"\\033[94m\", \"\")\n",
    "    \n",
    "    output_queue.put(result)\n",
    "\n",
    "# Gradio UI Functionality\n",
    "def interact_with_agent(user_message, history, discard_user_input=False):\n",
    "    \"\"\"Send user message to the bot and handle the response.\"\"\"\n",
    "    \n",
    "    global waiting_for_input\n",
    "    \n",
    "    if not discard_user_input:\n",
    "        input_queue.put(user_message + \"\\n\")  # Send user input to LangGraph\n",
    "        \n",
    "    partial_message = \"\"\n",
    "\n",
    "    # Fetch and yield bot responses incrementally\n",
    "    while True:\n",
    "        try:\n",
    "            message = output_queue.get(timeout=0.1)  # Wait for bot output\n",
    "            if message:\n",
    "                    \n",
    "                partial_message += message\n",
    "                yield partial_message\n",
    "        except queue.Empty:\n",
    "            is_end = waiting_for_input\n",
    "            if is_end:\n",
    "                break\n",
    "            time.sleep(0.1)\n",
    "            \n",
    "def get_initial_message():\n",
    "    \"\"\"Run the agent and capture the initial message.\"\"\"\n",
    "    # Simulate an initial empty input to get the initial message\n",
    "    initial_message = \"\"\n",
    "    \n",
    "    for message in interact_with_agent(\"\", [], True):  # Consume the generator to get the full initial message\n",
    "        initial_message = message  # Keep updating until the generator finishes\n",
    "\n",
    "    return initial_message\n",
    "\n",
    "initial_message_content = \"\"\n",
    "\n",
    "def run_langgraph_agent():\n",
    "    \"\"\"Run the LangGraph agent and redirect its stdout.\"\"\"\n",
    "    global initial_message_content\n",
    "    run_car_buyer_agent()  # Start the LangGraph workflow\n",
    "\n",
    "\n",
    "if USE_GRADIO:\n",
    "    # Run the agent in a separate thread\n",
    "    agent_thread = threading.Thread(target=run_langgraph_agent, daemon=True)\n",
    "    agent_thread.start()\n",
    "\n",
    "    initial_message_content = get_initial_message()\n",
    "\n",
    "    initial_messages = [{\"role\": \"assistant\", \"content\": initial_message_content}]\n",
    "\n",
    "    chat = gr.ChatInterface(interact_with_agent,\n",
    "                    chatbot=gr.Chatbot(label=\"Car Buyer Chatbot\", autoscroll=True, scale=1, value=initial_messages, type=\"messages\", height=200),\n",
    "                    type=\"messages\").launch()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "main",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
